What's in a Domain? Analyzing Genre and Topic Differences in Statistical Machine Translation
Abstract
Domain adaptation is an active field of research in statistical machine translation (SMT), but so far most work has ignored the distinction between the topic and the genre of documents. In this paper we quantify and disentangle the impact of genre and topic differences on translation quality by introducing a new data set with controlled topic and genre distributions. In addition, we perform a detailed analysis showing that topic differences explain translation performance differences across genres only to a limited degree, and that genre-specific errors are attributable more to model coverage than to suboptimal scoring of translation candidates.
Similar resources
What’s in a Domain? Analyzing Genre and Topic Differences in SMT
Domain adaptation is an active field of research in statistical machine translation (SMT), but so far most work has ignored the distinction between the topic and genre of documents. In this paper we quantify and disentangle the impact of genre and topic differences on translation quality by introducing a new data set that has controlled topic and genre distributions. In addition, we perform a d...
Contextual Modeling for Meeting Translation Using Unsupervised Word Sense Disambiguation
In this paper we investigate the challenges of applying statistical machine translation to meeting conversations, with a particular view towards analyzing the importance of modeling contextual factors such as the larger discourse context and topic/domain information on translation performance. We describe the collection of a small corpus of parallel meeting data, the development of a statistica...
Selection-Based Language Model for Domain Adaptation using Topic Modeling
This paper introduces a selection-based LM using topic modeling for the purpose of domain adaptation, which is often required in Statistical Machine Translation. This selection-based LM slightly outperforms the state-of-the-art Moore-Lewis LM by 1.0% for EN-ES and 0.7% for ES-EN in terms of BLEU. The performance gain in terms of perplexity was 8% over the Moore-Lewis LM and 17%...
Translation Model Adaptation Using Genre-Revealing Text Features
Research in domain adaptation for statistical machine translation (SMT) has resulted in various approaches that adapt system components to specific translation tasks. The concept of a domain, however, is not precisely defined, and most approaches rely on provenance information or manual subcorpus labels, while genre differences have not been addressed explicitly. Motivated by the large translat...
Adaptation in Machine Translation
Machine translation remains one of the grand challenge problems of natural language processing. Recent advances in the field have led to a number of applications demonstrating the potential and impact of the technology. Statistical machine translation (SMT) has emerged as the currently most promising approach to tackle the translation problem. During the last decade, it advanced to solidly outp...